GPU-Accelerated Parallel Branch Prediction for Multi/Many-Core Processor Simulation
Abstract
Branch prediction is a common function in modern microprocessors. A branch predictor is duplicated in each core of a multi/many-core processor and makes predictions for the multiple programs running concurrently on those cores. To evaluate parallel branch prediction in a multi/many-core processor, existing schemes generally use a parallel simulator running on a CPU, which lacks a truly massively parallel execution environment to support the simulation and therefore delivers poor simulation performance. In this paper, we use a real many-core platform, the GPU, to perform a parallel simulation of branch prediction for future general-purpose multi/many-core processor designs. We verify the correctness of the GPU-based parallel branch predictor against a traditional CPU-based branch predictor. Experimental results show that the GPU-based parallel simulation scheme obtains a speedup of two to ten times over the CPU platform when the issue rate ranges from one to four instructions per cycle, indicating that the GPU-based scheme is a promising way to improve simulation speed for future multi/many-core processor research.
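The structure the abstract describes, one independent predictor per core checked against a sequential reference, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the bimodal two-bit-counter predictor, the table size, the trace format of (pc, taken) pairs, the function names, and the use of a thread pool as a stand-in for GPU threads.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate_core(trace, table_bits=4):
    """Run one core's bimodal predictor over a trace of (pc, taken) pairs.

    Hypothetical predictor: a table of 2-bit saturating counters indexed
    by the low bits of the branch address. Returns the misprediction count.
    """
    size = 1 << table_bits
    counters = [1] * size  # every counter starts at "weakly not taken"
    mispredictions = 0
    for pc, taken in trace:
        index = pc & (size - 1)
        predicted_taken = counters[index] >= 2
        if predicted_taken != taken:
            mispredictions += 1
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        if taken:
            counters[index] = min(3, counters[index] + 1)
        else:
            counters[index] = max(0, counters[index] - 1)
    return mispredictions


def simulate_sequential(traces):
    """Reference run: simulate the cores one after another."""
    return [simulate_core(trace) for trace in traces]


def simulate_parallel(traces, workers=4):
    """Simulate the cores concurrently; cores share no predictor state."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate_core, traces))


# Hypothetical per-core traces, one list per simulated core.
traces = [
    [(0x40, True)] * 8,                      # always-taken loop branch
    [(0x80, i % 2 == 0) for i in range(8)],  # alternating branch
]

# Correctness check in the spirit of the abstract: the parallel run
# must reproduce the sequential reference exactly.
assert simulate_parallel(traces) == simulate_sequential(traces)
```

Because each simulated core touches only its own counter table and its own trace, the per-core work maps naturally onto one GPU thread or thread block per core; the thread pool here only shows that independence and says nothing about the speedups reported above.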
Similar resources
Parallel Branch Prediction on GPU Platform
Branch prediction is a common function in modern microprocessors. A branch predictor is duplicated in each core of a multicore or many-core processor and makes predictions for the multiple programs running concurrently. To evaluate parallel branch prediction in a many-core processor, existing schemes generally use a parallel simulator running on a CPU, which does not ha...
Parallelized computation for computer simulation of electrocardiograms using personal computers with multi-core CPU and general-purpose GPU
Biological computations such as electrocardiological modelling and simulation usually require high-performance computing environments. This paper introduces an implementation of parallel computation for computer simulation of electrocardiograms (ECGs) in a personal computer environment with an Intel Core(TM) 2 Quad Q6600 CPU and a GeForce 8800GT GPU, with software support by OpenMP and CUDA...
RLT2-based Parallel Algorithms for Solving Large Quadratic Assignment Problems on Graphics Processing Unit Clusters
This paper discusses efficient parallel algorithms for obtaining strong lower bounds and exact solutions for large instances of the Quadratic Assignment Problem (QAP). Our parallel architecture is comprised of both multi-core processors and Compute Unified Device Architecture (CUDA) enabled NVIDIA Graphics Processing Units (GPUs) on the Blue Waters Supercomputing Facility at the University of I...
Techniques for adapting industrial simulation software for power devices and networks to multi- and many-core architectures
Simulation software has been widely used in academic and industrial environments for a long time. In recent years, however, the available hardware characteristics have changed significantly and rapidly. Several years ago, true parallel processing was only available using clusters or expensive workstations, whereas today systems with multiple processor cores are well established and even mass ma...